19 research outputs found
Indoor Localization Using Radio, Vision and Audio Sensors: Real-Life Data Validation and Discussion
This paper investigates indoor localization methods using radio, vision, and
audio sensors, respectively, in the same environment. The evaluation is based
on state-of-the-art algorithms and uses a real-life dataset. More specifically,
we evaluate a machine learning algorithm for radio-based localization with
massive MIMO technology, an ORB-SLAM3 algorithm for vision-based localization
with an RGB-D camera, and an SFS2 algorithm for audio-based localization with
microphone arrays. Aspects including localization accuracy, reliability,
calibration requirements, and potential system complexity are discussed to
analyze the advantages and limitations of using different sensors for indoor
localization tasks. The results can serve as a guideline and basis for further
development of robust and high-precision multi-sensory localization systems,
e.g., through sensor fusion and context- and environment-aware adaptation.
Comment: 6 pages, 6 figures
Affine Invariants of Planar Sets
Recent research has indicated that invariants can be useful in computer vision for identification and pose determination of objects. The idea is to find functions that are invariant under a set of transformations acting on a configuration space. This paper describes some new viewpoints on the construction and use of such invariants. The key idea is that any kind of features, such as derivatives, distinguished points, or integrative features, can be used to construct invariants. In a given viewing situation one should choose the features that are most stable. As examples, affine invariants for planar smooth curves, planar regions, and planar point configurations are given. The properties of the invariants are illustrated with experiments.
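As a small illustration of the kind of invariant discussed for planar point configurations (a sketch of the general idea, not the paper's construction): the ratio of two signed triangle areas is unchanged by any affine map, because every signed area is scaled by the same determinant.

```python
import numpy as np

def signed_area(p, q, r):
    """Signed area of the triangle (p, q, r)."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

def affine_invariant(pts):
    """Ratio of two signed triangle areas for four planar points.
    An affine map x -> A x + t scales every signed area by det(A),
    so the ratio cancels and is affine invariant."""
    p1, p2, p3, p4 = pts
    return signed_area(p1, p2, p3) / signed_area(p1, p2, p4)

# Check invariance under a fixed (invertible) affine map.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A = np.array([[2.0, 1.0], [-1.0, 3.0]])   # det(A) = 7, nonsingular
t = np.array([5.0, -2.0])
mapped = pts @ A.T + t
```

The same recipe (quotients of quantities that all pick up the same factor under the transformation group) extends to the curve and region invariants mentioned in the abstract.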
Critical Configurations for N-view Projective Reconstruction
In this paper we give a characterization of critical configurations for projective reconstruction with any number of points and views. A set of cameras and points is said to be critical if the projected image points are insufficient to determine the placement of the points and the cameras uniquely, up to a projective transformation. For two views, the critical configurations are well known. In this paper it is shown that a configuration of n ≥ 3 cameras and m points, all lying on the intersection of two distinct ruled quadrics, is critical. In contrast to the two-view case, which in general allows two alternative solutions, there is a family of ambiguous reconstructions for the n-view case. As a partial converse, it is shown that for any critical configuration, all the points lie on the intersection of two ruled quadrics.
Parameterisation invariant statistical shape models
In this paper, novel theory to automate shape modelling is described. The main idea is to develop a theory that is intrinsically defined for curves, as opposed to a finite sample of points along the curves. The major problem here is to define shape variation in a way that is invariant to curve parameterisations. Instead of representing continuous curves using landmarks, the problem is treated analytically and numerical approximations are introduced only at the final stage. The problem is solved by calculating the covariance matrix of the shapes using a scalar product that is invariant to global reparameterisations. An algorithm implementing these ideas is proposed and compared to a state-of-the-art algorithm for automatic shape modelling. The stability problems of earlier formulations are solved and the resulting models are of higher quality.
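A concrete, simplified way to see the parameterisation issue: if a sampled curve is resampled uniformly by arc length, the resulting representation no longer depends on how the curve was originally parameterised. This is only a numerical sketch of the invariance idea, not the paper's invariant scalar product.

```python
import numpy as np

def arclength_resample(curve, n=100):
    """Resample a sampled planar curve (m x 2 array) uniformly by arc
    length, making the representation approximately invariant to the
    original parameterisation."""
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)   # chord lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])            # cumulative arc length
    u = np.linspace(0.0, s[-1], n)                         # uniform arc-length grid
    return np.column_stack([np.interp(u, s, curve[:, k]) for k in range(2)])

# The same circle sampled with two different parameterisations...
t1 = np.linspace(0.0, 2 * np.pi, 400)
u = np.linspace(0.0, 1.0, 400)
t2 = 2 * np.pi * u**2                                      # warped parameterisation
c1 = np.column_stack([np.cos(t1), np.sin(t1)])
c2 = np.column_stack([np.cos(t2), np.sin(t2)])
# ...yields nearly identical arc-length representations.
r1, r2 = arclength_resample(c1), arclength_resample(c2)
```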
Collaborative merging of radio SLAM maps in view of crowd-sourced data acquisition and big data
Indoor localization and navigation is a much-researched and difficult problem. The best solutions usually use expensive specialized equipment and/or some form of prior calibration. To the average person with smart or Internet-of-Things devices, these solutions are not feasible, particularly at large scale. With hardware advancements making Ultra-Wideband devices more accurate and lower powered, there is potential for such devices to become commonplace around factories and homes, enabling an alternative method of navigation. Indoor anchor calibration therefore becomes a key problem in implementing these devices efficiently and effectively. In this paper, we present a method to fuse radio SLAM (also known as Time-Of-Arrival self-calibration) maps together in a linear way. In doing so, we are able to collaboratively calibrate the anchor positions in 3D to the native precision of the devices. Furthermore, we introduce an automatic scheme that determines which maps are best to use to further improve the anchor calibration and its robustness, and which maps can be discarded. Additionally, fusing a map in a linear way is computationally cheap and produces a reasonable map, which is required to push towards crowd-sourced data acquisition.
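To sketch what fusing anchor maps can look like in practice (our own minimal illustration using a rigid least-squares Kabsch alignment, not the paper's linear fusion method): align each map of the shared anchors to a reference map with a least-squares rotation and translation, then average the anchor positions.

```python
import numpy as np

def align(src, dst):
    """Least-squares rigid alignment (Kabsch): find R, t with dst ≈ src @ R.T + t."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

def fuse_maps(maps):
    """Align each anchor map to the first one and average anchor positions."""
    ref = maps[0]
    aligned = [ref]
    for m in maps[1:]:
        R, t = align(m, ref)
        aligned.append(m @ R.T + t)
    return np.mean(aligned, axis=0)

# Two noiseless maps of the same 5 anchors, differing by a rigid motion.
anchors = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]])
th = 0.5
R_true = np.array([[np.cos(th), -np.sin(th), 0],
                   [np.sin(th),  np.cos(th), 0],
                   [0.0,         0.0,        1]])
map2 = anchors @ R_true.T + np.array([1.0, 2.0, 3.0])
fused = fuse_maps([anchors, map2])
```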
Bootstrapped Representation Learning for Skeleton-Based Action Recognition
In this work, we study self-supervised representation learning for 3D skeleton-based action recognition. We extend Bootstrap Your Own Latent (BYOL) to representation learning on skeleton sequence data and propose a new data augmentation strategy including two asymmetric transformation pipelines. We also introduce a multi-viewpoint sampling method that leverages multiple viewing angles of the same action captured by different cameras. In the semi-supervised setting, we show that performance can be further improved by knowledge distillation from wider networks, once more leveraging the unlabeled samples. We conduct extensive experiments on the NTU-60, NTU-120 and PKU-MMD datasets to demonstrate the performance of our proposed method. Our method consistently outperforms the current state of the art on linear evaluation, semi-supervised and transfer learning benchmarks.
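For context, the BYOL framework this builds on maintains a target network as an exponential moving average (EMA) of the online network; a minimal sketch of that update (the function name and plain-array parameters are illustrative):

```python
import numpy as np

def ema_update(online_params, target_params, tau=0.99):
    """BYOL-style EMA of the target network:
    target <- tau * target + (1 - tau) * online, parameter by parameter."""
    return [tau * t + (1.0 - tau) * o
            for o, t in zip(online_params, target_params)]
```

In BYOL the online network is trained to predict the target network's representation of a differently augmented view, while the target is only ever updated through this slow-moving average.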
Optimal Trilateration Is an Eigenvalue Problem
The problem of estimating receiver or sender node positions from measured receiver-sender distances is a key issue in applications such as microphone array calibration, radio antenna array calibration, and mapping and positioning using UWB or round-trip-time measurements between mobile phones and WiFi units. In this paper we address the problem of optimally estimating a receiver position given a number of distance measurements to known sender positions, so-called trilateration. We show that this problem can be rephrased as an eigenvalue problem. We also address different error models and the multilateration setting, where an additional offset is also unknown, and show that these problems can be modeled using the same framework.
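For reference, the common (non-optimal) baseline for trilateration is a linearization: subtracting one range equation from the others removes the quadratic term in the unknown position, leaving a linear least-squares system. This is the textbook approach, not the paper's eigenvalue solver.

```python
import numpy as np

def trilaterate(senders, dists):
    """Linearized trilateration.
    From ||x - s_i||^2 = d_i^2, subtracting the equation for s_0 gives
    2 (s_i - s_0) . x = d_0^2 - d_i^2 + ||s_i||^2 - ||s_0||^2,
    a linear system solved in the least-squares sense."""
    s0, d0 = senders[0], dists[0]
    A = 2.0 * (senders[1:] - s0)
    b = (d0**2 - dists[1:]**2
         + np.sum(senders[1:]**2, axis=1) - np.sum(s0**2))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Noiseless check: recover a known receiver from 4 sender ranges.
senders = np.array([[0.0, 0, 0], [3, 0, 0], [0, 3, 0], [0, 0, 3]])
x_true = np.array([1.0, 2.0, 0.5])
dists = np.linalg.norm(senders - x_true, axis=1)
x_est = trilaterate(senders, dists)
```

Note that this linearization is exact only for noiseless data; with noisy ranges it is not the maximum-likelihood estimate, which is what motivates optimal formulations such as the one in the paper.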
Parameterization of Ambiguity in Monocular Depth Prediction
Monocular depth estimation is a highly challenging problem that is often addressed with deep neural networks. While these use recognition of high-level image features to predict reasonable-looking depth maps, the result often has poor metric accuracy. Moreover, the standard feed-forward architecture does not allow modification of the prediction based on cues other than the image. In this paper we relax the monocular depth estimation task by proposing a network that allows us to complement image features with a set of auxiliary variables. These allow disambiguation when image features are not enough to accurately pinpoint the exact depth map, and can be thought of as a low-dimensional parameterization of the surfaces that are reasonable monocular predictions. By searching the parameterization we can combine monocular estimation with traditional photoconsistency- or geometry-based methods to achieve surface estimations that are both visually appealing and metrically accurate. Since we relax the problem, we are able to work with smaller networks than current architectures. In addition, we design a self-supervised training scheme, eliminating the need for ground-truth image depth-map pairs. Our experimental evaluation shows that our method generates more accurate depth maps and generalizes better than competing state-of-the-art approaches.
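As a toy analogue of this idea (our own illustration, not the proposed network): if the depth map is written as a base prediction plus a low-dimensional correction, a handful of external metric constraints suffices to pin down the correction coefficients. Here a few known depths stand in for the photoconsistency or geometry cues.

```python
import numpy as np

def refine_depth(base_depth, basis, sparse_idx, sparse_depth):
    """Toy low-dimensional depth refinement: depth(z) = base + basis @ z,
    with z chosen by least squares against a few metric depth constraints.
    base_depth: (N,) flattened base prediction; basis: (N, k) correction modes."""
    A = basis[sparse_idx]                       # basis rows at constrained pixels
    b = sparse_depth - base_depth[sparse_idx]   # residual the correction must explain
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return base_depth + basis @ z

# 4-pixel example with a 2-dimensional correction space.
basis = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
base = np.ones(4)
z_true = np.array([0.5, -0.2])
truth = base + basis @ z_true
sparse_idx = np.array([0, 1])                   # two metric constraints
refined = refine_depth(base, basis, sparse_idx, truth[sparse_idx])
```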
Trilateration Using Motion Models
In this paper, we present a framework for localization from distance measurements, given an estimate of the local motion. We show how to register the local motion of a receiver to a global coordinate system, using trilateration of given distance measurements from the receivers to senders in known positions. We describe how many different motion models can be formulated within the same type of registration framework, by only changing the transformation group. The registration is based on a hypothesis-and-test framework, such as RANSAC, and we present novel and fast minimal solvers that can be used to bootstrap such methods. The system is tested on both synthetic and real data with promising results.
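A minimal hypothesis-and-test sketch in the spirit described (heavily simplified: a textbook linearized solver stands in for the paper's minimal solvers, and a single static position is estimated rather than a registered motion model):

```python
import numpy as np

def trilaterate(senders, dists):
    """Linearized trilateration by subtracting the first range equation."""
    s0, d0 = senders[0], dists[0]
    A = 2.0 * (senders[1:] - s0)
    b = (d0**2 - dists[1:]**2
         + np.sum(senders[1:]**2, axis=1) - np.sum(s0**2))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

def ransac_trilaterate(senders, dists, iters=200, thresh=0.1, seed=0):
    """Hypothesize-and-test: fit on random minimal 4-subsets of the ranges
    and keep the position consistent with the most measurements."""
    rng = np.random.default_rng(seed)
    best_x, best_inliers = None, -1
    for _ in range(iters):
        idx = rng.choice(len(senders), 4, replace=False)
        x = trilaterate(senders[idx], dists[idx])          # hypothesis
        resid = np.abs(np.linalg.norm(senders - x, axis=1) - dists)
        n = int((resid < thresh).sum())                    # test against all ranges
        if n > best_inliers:
            best_inliers, best_x = n, x
    return best_x, best_inliers

# 8 senders, 2 grossly corrupted ranges: RANSAC still recovers the position.
senders = np.array([[0.0, 0, 0], [4, 0, 0], [0, 4, 0], [0, 0, 4],
                    [4, 4, 0], [4, 0, 4], [0, 4, 4], [4, 4, 4]])
x_true = np.array([1.0, 2.0, 3.0])
dists = np.linalg.norm(senders - x_true, axis=1)
dists[0] += 2.0
dists[5] -= 1.5                                            # two outliers
x_est, n_in = ransac_trilaterate(senders, dists)
```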
Accurate Indoor Positioning Based on Learned Absolute and Relative Models
To improve the accuracy of indoor positioning systems, it can be useful to combine different types of sensor data. This paper describes deep learning methods both for estimating absolute positions and for performing pedestrian dead reckoning, and then shows how to combine the resulting estimates using weighted least squares optimization. The positioning model is based on a custom neural network which uses measurements of received signal strength indication from one instant of time as input. The model for estimating relative positions, on the other hand, is based on inertial sensors: the accelerometer, magnetometer and gyroscope. The position estimates are then combined using a least squares approach, with weights based on the standard deviations of the errors in the predictions from the two models.
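The combination step can be sketched as follows (a minimal version with one scalar error standard deviation per model, assuming the models' outputs are already available): absolute fixes constrain each position directly, relative steps constrain consecutive differences, and both enter one weighted least-squares system.

```python
import numpy as np

def fuse_wls(abs_meas, rel_meas, sigma_abs, sigma_rel):
    """Fuse absolute position fixes with relative (dead-reckoning) steps
    via weighted least squares. Unknowns: positions x_1..x_n (each d-dim),
    stacked into one vector; rows are weighted by 1/sigma of their model."""
    n, d = abs_meas.shape
    rows, b, w = [], [], []
    for i in range(n):                         # absolute model: x_i = a_i
        for k in range(d):
            r = np.zeros(n * d); r[i * d + k] = 1.0
            rows.append(r); b.append(abs_meas[i, k]); w.append(1.0 / sigma_abs)
    for i in range(n - 1):                     # relative model: x_{i+1} - x_i = r_i
        for k in range(d):
            r = np.zeros(n * d)
            r[(i + 1) * d + k] = 1.0; r[i * d + k] = -1.0
            rows.append(r); b.append(rel_meas[i, k]); w.append(1.0 / sigma_rel)
    A = np.asarray(rows) * np.asarray(w)[:, None]
    b = np.asarray(b) * np.asarray(w)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x.reshape(n, d)

# Noiseless consistency check: both models agree, so the fit is exact.
truth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]])
fused = fuse_wls(truth, np.diff(truth, axis=0), sigma_abs=1.0, sigma_rel=0.5)
```

With noisy inputs, the 1/sigma weighting makes the solution lean towards whichever model is more reliable, which is the role the standard deviations play in the paper's combination step.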